Malware family classification via efficient Huffman features
Abstract
As malware evolves and becomes more complex, researchers strive to develop detection and classification schemes that abstract away from the internal intricacies of binary code to represent malware without the need for architectural knowledge or invasive analysis procedures. Such approaches can reduce the complexities of feature generation and simplify the process. In this paper, we present efficient Huffman features (eHf), a novel compression-based approach to feature construction, based on Huffman encoding, where malware features are represented in a compact format, without the need for intrusive reverse-engineering or dynamic analysis processes. We demonstrate the viability of eHf as a solution for classifying malware into their respective families on a large corpus of 15 k samples, indicative of the current threat landscape. We evaluate eHf against alternatives and show that our method is comparable or superior in accuracy, while exhibiting considerably greater runtime efficiency. Finally, we show that eHf is resilient to reordering obfuscation.
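To make the idea of a Huffman-derived feature concrete, the following is a minimal sketch and not the eHf construction itself, whose details the abstract does not give. It assumes, purely for illustration, that a sample is treated as a raw byte string and that the feature vector holds the Huffman code length assigned to each of the 256 byte values. The function names `huffman_code_lengths` and `huffman_feature_vector` are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(data: bytes) -> dict[int, int]:
    """Return the Huffman code length (in bits) for each byte value in data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit to be encoded.
        return {next(iter(freq)): 1}

    # Heap entries are (weight, tiebreak, symbols in this subtree).
    # The tiebreak is unique, so the symbol lists are never compared.
    heap = [(weight, sym, [sym]) for sym, weight in freq.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    tiebreak = 256  # byte values occupy 0..255, so start above them

    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, tiebreak, syms1 + syms2))
        tiebreak += 1
    return lengths


def huffman_feature_vector(data: bytes) -> list[int]:
    """Illustrative assumption: a fixed 256-entry vector of code lengths,
    with 0 for byte values that do not occur in the sample."""
    lengths = huffman_code_lengths(data)
    return [lengths.get(b, 0) for b in range(256)]
```

Because this sketch depends only on byte frequencies, permuting the bytes of a sample leaves the vector unchanged. That is a property of the sketch, and it should not be read as the mechanism behind the reordering resilience reported for eHf.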
Similar resources
Exploring Discriminatory Features for Automated Malware Classification
The ever-growing malware threat in the cyber space calls for techniques that are more effective than widely deployed signature-based detection systems and more scalable than manual reverse engineering by forensic experts. To counter large volumes of malware variants, machine learning techniques have been applied recently for automated malware classification. Despite the successes made from thes...
Efficient Classification of Android Malware in the wild using Robust Static Features
The ubiquitous use of Android smartphones continues to threaten the security and privacy of users’ personal information. Its fast adoption rate makes the smartphone an interesting target for malware authors to deploy new attacks and infect millions of devices. Moreover, the growing number and diversity of malicious applications render conventional defenses ineffective. Thus, there is a need to n...
Malware Detection using Classification of Variable-Length Sequences
In this paper, a novel graph-based method is proposed to classify variable-length sequences as feature extraction. The proposed method overcomes the problems of the traditional graph with variable-length data, without fixing the length of sequences, by determining the most frequent instructions and inserting the rest of the instructions into the set of “other”, which saves time and memory. Acco...
Microsoft Malware Classification Challenge
The Microsoft Malware Classification Challenge was announced in 2015 along with a publication of a huge dataset of nearly 0.5 terabytes, consisting of disassembly and bytecode of more than 20K malware samples. Apart from serving in the Kaggle competition, the dataset has become a standard benchmark for research on modeling malware behaviour. To date, the dataset has been cited in more than 50 r...
An efficient decoding technique for Huffman codes
We present a new data structure for Huffman coding in which in addition to sending symbols in order of their appearance in the Huffman tree one needs to send codes of all circular leaf nodes (nodes with two adjacent external nodes), the number of which is always bounded above by half the number of symbols. We decode the text by using the memory efficient data structure proposed by Chen et al. [...
Journal
Journal title: Forensic Science International: Digital Investigation
Year: 2021
ISSN: 2666-2825, 2666-2817
DOI: https://doi.org/10.1016/j.fsidi.2021.301192